MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning
Mishani, Itamar, Shaoul, Yorai, Likhachev, Maxim
Planning long-horizon manipulation motions using a set of predefined skills is a central challenge in robotics; solving it efficiently could enable general-purpose robots to tackle novel tasks by flexibly composing generic skills. Solutions to this problem lie in an infinitely vast space of parameterized skill sequences -- a space where common incremental methods struggle to find sequences with non-obvious intermediate steps. Some approaches reason over lower-dimensional, symbolic spaces, which are more tractable to explore but may be brittle and are laborious to construct. In this work, we introduce MOSAIC, a skill-centric, multi-directional planning approach that targets these challenges by reasoning about which skills to employ and where they are most likely to succeed, utilizing physics simulation to estimate skill execution outcomes. Specifically, MOSAIC employs two complementary skill families: Generators, which identify "islands of competence" where skills are demonstrably effective, and Connectors, which link these skill trajectories by solving boundary value problems. By focusing planning efforts on regions of high competence, MOSAIC efficiently discovers physically grounded solutions. We demonstrate its efficacy on complex long-horizon problems in both simulation and the real world, using a diverse set of skills including generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit skill-mosaic.github.io for demonstrations and examples.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.88)
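The Generator/Connector decomposition described in the abstract above can be illustrated with a toy sketch on a 1-D state space. This is not the authors' code: `Segment`, `generator`, and `connector` are invented names, and the "boundary value problem" is reduced to straight-line interpolation.

```python
# Hypothetical sketch of MOSAIC-style planning on a toy 1-D state space:
# a Generator proposes a skill trajectory inside its "island of competence"
# (here, near the goal), and a Connector links boundary states.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float
    end: float

def generator(goal):
    """Propose a skill trajectory where the skill is competent (toy: it can
    reliably cover the last 2.0 units before the goal)."""
    return Segment(start=goal - 2.0, end=goal)

def connector(a_end, b_start):
    """Solve a boundary value problem linking two trajectory endpoints
    (toy: a straight-line segment between the two boundary states)."""
    return Segment(start=a_end, end=b_start)

def plan(start, goal):
    island = generator(goal)                 # trajectory ending at the goal
    bridge = connector(start, island.start)  # link the start state to it
    return [bridge, island]

path = plan(0.0, 10.0)
```

The key property the sketch preserves is continuity: each connector's end state coincides with the next generator segment's start state.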
ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning
Choi, Jae-Woo, Kim, Hyungmin, Ong, Hyobin, Jang, Minsu, Kim, Dohyung, Kim, Jaehong, Yoon, Youngwoo
Recent advancements in large language models (LLMs) have enabled significant progress in decision-making and task planning for embodied autonomous agents. However, most existing methods still struggle with complex, long-horizon tasks because they rely on a monolithic trajectory that entangles all past decisions and observations, attempting to solve the entire task in a single unified process. To address this limitation, we propose ReAcTree, a hierarchical task-planning method that decomposes a complex goal into more manageable subgoals within a dynamically constructed agent tree. Each subgoal is handled by an LLM agent node capable of reasoning, acting, and further expanding the tree, while control flow nodes coordinate the execution strategies of agent nodes. In addition, we integrate two complementary memory systems: each agent node retrieves goal-specific, subgoal-level examples from episodic memory and shares environment-specific observations through working memory. Experiments on the WAH-NL and ALFRED datasets demonstrate that ReAcTree consistently outperforms strong task-planning baselines such as ReAct across diverse LLMs. Notably, on WAH-NL, ReAcTree achieves a 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31%.
- Research Report (0.64)
- Workflow (0.46)
- Consumer Products & Services (0.71)
- Health & Medicine (0.70)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
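The agent-tree expansion described in the ReAcTree abstract above can be sketched as a recursive decomposition, with a control-flow "sequence" node requiring all child subgoals to succeed in order. This is an illustrative toy, not the authors' implementation: `decompose` is a stand-in for an LLM call, and the string-splitting heuristic is invented.

```python
# Toy sketch of a dynamically constructed agent tree: agent nodes either act
# on a subgoal directly (leaf) or expand into child subgoals, coordinated by
# a sequence-style control-flow node.
def decompose(subgoal):
    """Hypothetical LLM stand-in: split a compound goal into subgoals,
    or return None to act directly."""
    return subgoal.split(" and ") if " and " in subgoal else None

def execute(subgoal, log):
    children = decompose(subgoal)
    if children is None:          # leaf agent node: reason + act
        log.append(subgoal)
        return True
    # control-flow "sequence" node: every child must succeed, in order
    return all(execute(child, log) for child in children)

log = []
ok = execute("pick up the mug and place it on the shelf", log)
```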
SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing
Haworth, Jesse, Chen, Juo-Tung, Nelson, Nigel, Kim, Ji Woong, Moghani, Masoud, Finn, Chelsea, Krieger, Axel
Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying. Despite numerous efforts toward end-to-end autonomy, a fully autonomous suturing pipeline has yet to be demonstrated on physical hardware. We introduce SutureBot: an autonomous suturing benchmark on the da Vinci Research Kit (dVRK), spanning needle pickup, tissue insertion, and knot tying. To ensure repeatability, we release a high-fidelity dataset comprising 1,890 suturing demonstrations. Furthermore, we propose a goal-conditioned framework that explicitly optimizes insertion-point precision, improving targeting accuracy by 59%-74% over a task-only baseline. To establish this task as a benchmark for dexterous imitation learning, we evaluate state-of-the-art vision-language-action (VLA) models, including π0, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with a high-level task-prediction policy. Autonomous suturing is a key milestone toward achieving robotic autonomy in surgery. These contributions support reproducible evaluation and development of precision-focused, long-horizon dexterous manipulation policies necessary for end-to-end suturing. Dataset is available at: https://huggingface.co/datasets/jchen396/suturebot
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Health Care Technology (0.93)
- Health & Medicine > Surgery (0.67)
A Task Details
Table 5: All task variations except shape used in VLMbench. Table 6: All object models used in VLMbench.
Object models (Table 6):
- Basic model, 3 classes: cube (1), triangular prism (1), cylinder (1)
- Special model, 9 classes: star (1), moon (1), cross (1), flower (1), letter 't' (1), pencil (1), basket (1), box container (1), shape sorter (1)
- Planar model, 6 classes: rectangle (1), circle (1), triangle (1), star (1), cross (1), flower (1)
- Functional model, 2 classes: mug (6), sponge (1)
- Articulated model, 2 classes: door with one rotatable handle (2), cabinet with three vertical drawers (3)
In VLMbench, we show eight task categories, including "Pick & Place objects", "Stack objects", and "Drop ...". When building an instance-level task with one variation, the other variations will also change randomly (for example, in the demonstrations of "Pick & Place objects"). In the dataset, we have five types of objects, shown in Table 6. Visualizations can be found on the project website. The object can be placed anywhere, with any orientation, inside the container. When the detector is triggered, the task is considered a success.
Instruction templates:
- High-level instructions: "Pick up [target object description] and place it into [target container description]."
- Low-level instructions: "Move to the top of [target object description]"; "Move the object into [target container description]"; ...
Variations and scene settings: All objects randomly change color, size, and position in each demonstration.
- Color: There are two same-shape objects and two same-shape containers in the scene initialization. All colors are randomly sampled from the color library. The object description is "[color] object"; the container description is "[color] container."
- Size: There are two same-shape objects and two same-shape containers in the scene initialization. One object and one container are randomly magnified while the others are randomly shrunk.
- Relative Position: There are two same-shape objects and two same-shape containers in the scene initialization. The object description is "[front/rear/left/right] object"; the container description ...
For "Stack objects", the number of objects varies from two to the length of the object library.
- High-level instructions: "Stack [below object description] and [above object description] ..."
- Low-level instructions: "Move to the top of [above object description]"; "Move the object on [below object description]"; "Release the ..."
- Object models: In the seen settings, five object models are used: star, triangular prism, cylinder, cube, moon.
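The bracketed instruction templates above can be instantiated by simple slot substitution. A minimal sketch (the slot fillers "red object" and "blue container" are invented examples; the template string follows the text):

```python
# Fill bracketed slots like "[target object description]" in an instruction
# template with concrete descriptions.
def fill(template, slots):
    out = template
    for name, value in slots.items():
        out = out.replace(f"[{name}]", value)
    return out

high_level = ("Pick up [target object description] and place it into "
              "[target container description].")
instr = fill(high_level, {
    "target object description": "red object",
    "target container description": "blue container",
})
```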
Grounding Language Models with Semantic Digital Twins for Robotic Planning
Naeem, Mehreen, Melnik, Andrew, Beetz, Michael
We introduce a novel framework that integrates Semantic Digital Twins (SDTs) with Large Language Models (LLMs) to enable adaptive and goal-driven robotic task execution in dynamic environments. The system decomposes natural language instructions into structured action triplets, which are grounded in contextual environmental data provided by the SDT. This semantic grounding allows the robot to interpret object affordances and interaction rules, enabling action planning and real-time adaptability. In case of execution failures, the LLM utilizes error feedback and SDT insights to generate recovery strategies and iteratively revise the action plan. We evaluate our approach using tasks from the ALFRED benchmark, demonstrating robust performance across various household scenarios. The proposed framework effectively combines high-level reasoning with semantic environment understanding, achieving reliable task completion in the face of uncertainty and failure.
- Research Report (0.68)
- Workflow (0.46)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)
MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
Mohammadi, Mohammad, Honerkamp, Daniel, Büchner, Martin, Cassinelli, Matteo, Welschehold, Tim, Despinoy, Fabien, Gilitschenski, Igor, Valada, Abhinav
Autonomous long-horizon mobile manipulation encompasses a multitude of challenges, including scene dynamics, unexplored areas, and error recovery. Recent works have leveraged foundation models for scene-level robotic reasoning and planning. However, the performance of these methods degrades when dealing with a large number of objects and large-scale environments. To address these limitations, we propose MORE, a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning for rearrangement tasks. MORE leverages scene graphs to represent environments, incorporates instance differentiation, and introduces an active filtering scheme that extracts task-relevant subgraphs of object and region instances. These steps yield a bounded planning problem, effectively mitigating hallucinations and improving reliability. Additionally, we introduce several enhancements that enable planning across both indoor and outdoor environments. We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark, outperforming recent foundation model-based approaches. Furthermore, we demonstrate the capabilities of our approach in several complex real-world tasks, mimicking everyday activities. We make the code publicly available at https://more-model.cs.uni-freiburg.de.
- Europe > Germany > Baden-Württemberg > Freiburg (0.24)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Netherlands > South Holland > Delft (0.04)
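The active-filtering step described in the MORE abstract above can be sketched as extracting the task-relevant subgraph of a scene graph. This is a hedged toy, not the authors' method: the scene graph, instance labels, and the word-matching heuristic are all invented for illustration.

```python
# Toy active filtering: keep only scene-graph object instances whose class
# name appears in the task description, plus the regions they occupy,
# yielding a bounded planning problem.
scene_graph = {                     # instance -> region (invented example)
    "mug_1": "kitchen", "mug_2": "office",
    "book_1": "office", "sofa_1": "living_room",
}

def task_relevant_subgraph(graph, task):
    words = task.lower().split()
    # instance names are "<class>_<id>"; match the class against task words
    keep = {n: r for n, r in graph.items() if n.split("_")[0] in words}
    regions = set(keep.values())
    return keep, regions

sub, regions = task_relevant_subgraph(scene_graph, "bring the mug to the office")
```

Note how instance differentiation is preserved: both `mug_1` and `mug_2` survive the filter, so the planner can still choose between them.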
PEA: Enhancing LLM Performance on Computational-Reasoning Tasks
Wang, Zi, Weng, Shiwei, Alhanahnah, Mohannad, Jha, Somesh, Reps, Tom
Large Language Models (LLMs) have exhibited significant generalization capabilities across diverse domains, prompting investigations into their potential as generic reasoning engines. Recent studies have explored inference-time computation techniques [Welleck et al., 2024, Snell et al., 2024], particularly prompt engineering methods such as Chain-of-Thought (CoT), to enhance LLM performance on complex reasoning tasks [Wei et al., 2022]. These approaches have successfully improved model performance and expanded LLMs' practical applications. However, despite the growing focus on enhancing model capabilities through inference-time computation for complex reasoning tasks, the current literature lacks a formal framework to precisely describe and characterize the complexity of reasoning problems. This study identifies a class of reasoning problems, termed computational reasoning problems, which are particularly challenging for LLMs [Yao et al., 2023, Hao et al., 2024, Valmeekam et al., 2023], such as planning problems and arithmetic games. Informally, these problems can be accurately described using succinct programmatic representations. We propose a formal framework to describe and algorithmically solve these problems. The framework employs first-order logic, equipped with efficiently computable predicates and finite domains.
- North America > United States > Virginia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
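The PEA abstract above describes problems stated in first-order logic with efficiently computable predicates over finite domains. A toy illustration of that framing (our own example, not the paper's framework): solve an exists-forall formula by enumeration over the finite domain.

```python
# Computational-reasoning toy: decide  exists x . forall y . P(x, y)
# over a finite domain, where P is an efficiently computable predicate.
domain = range(0, 10, 2)           # finite domain: even numbers 0..8

def predicate(x, y):
    """Efficiently computable predicate: x + y is even."""
    return (x + y) % 2 == 0

# brute-force enumeration is sound and complete on a finite domain
witnesses = [x for x in domain if all(predicate(x, y) for y in domain)]
```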
Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting
Zhang, Chaoyuan, Li, Zhaowei, Yuan, Wentao
Integrating large language models (LLMs) into closed-loop robotic task planning has become increasingly popular within embodied artificial intelligence. Previous efforts mainly focused on leveraging the strong reasoning abilities of LLMs to enhance task planning performance while often overlooking task planning efficiency and executability due to repetitive queries to LLMs. This paper addresses the synergy between LLMs and task planning systems, aiming to minimize redundancy while enhancing planning effectiveness. Specifically, building upon Prog-Prompt and the high-level concept of Tree-Planner, we propose Vote-Tree-Planner. This sampling strategy utilizes votes to guide plan traversal during the decision-making process. Our approach is motivated by a straightforward observation: assigning weights to agents during decision-making enables the evaluation of critical paths before execution. With this simple vote-tree construction, our method further improves the success rate and reduces the number of queries to LLMs. The experimental results highlight that our Vote-Tree-Planner demonstrates greater stability and shows a higher average success rate and goal condition recall on the unseen dataset compared with previous baseline methods. These findings underscore the potential of the Vote-Tree-Planner to enhance planning accuracy, reliability, and efficiency in LLM-based planning systems.
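The vote-guided traversal described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the sampled plans are invented, and each plan casts one vote per step, with traversal greedily following the highest-vote child at each depth.

```python
# Toy vote-tree traversal: several sampled plans (lists of action strings)
# vote on each step; we follow the majority action and keep only the plans
# consistent with the path so far.
from collections import Counter

sampled_plans = [
    ["open fridge", "grab milk", "close fridge"],
    ["open fridge", "grab milk", "pour milk"],
    ["open fridge", "grab juice", "close fridge"],
]

def vote_traverse(plans):
    path, depth = [], 0
    while any(len(p) > depth for p in plans):
        votes = Counter(p[depth] for p in plans if len(p) > depth)
        step, _ = votes.most_common(1)[0]       # highest-vote child
        path.append(step)
        plans = [p for p in plans if len(p) > depth and p[depth] == step]
        depth += 1
    return path

plan = vote_traverse(sampled_plans)
```

Because the tree is built once from sampled plans, the critical path is evaluated before execution, avoiding repeated LLM queries during traversal.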
Goal-Conditioned Data Augmentation for Offline Reinforcement Learning
Huang, Xingshuai, Wu, Di, Boulet, Benoit
Offline reinforcement learning (RL) enables policy learning from pre-collected offline datasets, relaxing the need to interact directly with the environment. However, limited by the quality of offline datasets, it generally fails to learn well-qualified policies in suboptimal datasets. To address datasets with insufficient optimal demonstrations, we introduce Goal-cOnditioned Data Augmentation (GODA), a novel goal-conditioned diffusion-based method for augmenting samples with higher quality. Leveraging recent advancements in generative modeling, GODA incorporates a novel return-oriented goal condition with various selection mechanisms. Specifically, we introduce a controllable scaling technique to provide enhanced return-based guidance during data sampling. GODA learns a comprehensive distribution representation of the original offline datasets while generating new data with selectively higher-return goals, thereby maximizing the utility of limited optimal demonstrations. Furthermore, we propose a novel adaptive gated conditioning method for processing noised inputs and conditions, enhancing the capture of goal-oriented guidance. We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
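The return-oriented goal condition with controllable scaling described in the GODA abstract above can be caricatured without any diffusion machinery: select a goal from the upper tail of the dataset's return distribution, then scale it to steer generation toward higher returns. The returns, quantile, and scale factor below are all invented.

```python
# Toy return-oriented goal condition: take a high quantile of the observed
# returns, then apply a controllable scaling factor to push the generative
# model's conditioning beyond the best data seen so far.
returns = [1.0, 3.0, 2.0, 8.0, 5.0, 9.0, 4.0]

def goal_condition(returns, quantile=0.75, scale=1.1):
    cutoff = sorted(returns)[int(quantile * (len(returns) - 1))]
    high = [r for r in returns if r >= cutoff]   # upper tail of returns
    return scale * max(high)                     # controllable scaling

goal = goal_condition(returns)
```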